perf: optimize core hot paths (chain, context, binding, responses) #3008

Open
vishr wants to merge 3 commits into master from perf/hot-paths-and-sonic-serializer

Conversation

@vishr

@vishr vishr commented Jun 14, 2026

Member

Summary

Optimizes Echo's per-request hot paths to remove avoidable allocations and CPU work. No public API changes; the standard-library JSON serializer remains the default. All numbers are benchstat medians (n=8, Apple M3 Max / arm64, Go 1.26).

Note (per @aldas review): the opt-in sonic serializer was removed from this PR — it belongs in echox/cookbook as a runnable example, not as a submodule in core. This PR is now purely core hot-path optimizations. See "Using a faster JSON encoder" below.

What changed

Core

  • Middleware chain compiled once (echo.go, buildRouterChains) and reused, instead of re-wrapping closures on every request. Routing stores the matched handler on the Context.
  • Context (context.go): zero-copy String/HTML/JSONP writes (through a write-only unsafe view of the string), reuse of delayedStatusWriter (guarded against re-entrant c.JSON) and of the store map across requests, an explicit unlock in Get/Set instead of a deferred one, and a single-key QueryParam fast path tested to be byte-for-byte equal to url.ParseQuery().Get (including malformed escapes, ';' and '+').
  • Binder (bind.go): per-reflect.Type field-metadata cache so struct tags are parsed once per type, not per request. Preserves the field-name error wrapping introduced in #3005 ("fix(binder): include field name in bind conversion errors (#2629)").
  • Middleware: precompute the HSTS header once (secure.go); pool the request-ID randomString scratch buffers (util.go).
  • New hot-path benchmark suite + pooling/dispatch regression tests.
  • test: de-flake TestStartConfig_WithListenerNetwork (ephemeral port instead of a hard-coded one) — separable commit; fixes a pre-existing CI flake.
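
To make the first bullet concrete, here is a minimal sketch of compiling a middleware chain once and reusing it. It does not use Echo's types: App, HandlerFunc, MiddlewareFunc, dispatch, and tag are all hypothetical stand-ins, and the dispatcher looks the route up by path rather than reading a matched handler from a Context as the PR describes.

```go
package main

import "fmt"

// HandlerFunc and MiddlewareFunc are simplified stand-ins for Echo's types.
type HandlerFunc func(path string) string
type MiddlewareFunc func(next HandlerFunc) HandlerFunc

// App is a hypothetical miniature server, not Echo's real struct.
type App struct {
	middleware []MiddlewareFunc
	routes     map[string]HandlerFunc
	chain      HandlerFunc // compiled once by Build
	wraps      int         // number of middleware closures created so far
}

// wrap applies middleware in reverse so the first registered one runs outermost.
func (a *App) wrap(h HandlerFunc) HandlerFunc {
	for i := len(a.middleware) - 1; i >= 0; i-- {
		h = a.middleware[i](h)
		a.wraps++
	}
	return h
}

// dispatch resolves the route at call time, so it can end a shared chain.
func (a *App) dispatch(path string) string {
	if h, ok := a.routes[path]; ok {
		return h(path)
	}
	return "404"
}

// ServePerRequest rebuilds the closures on every call (the "before" shape).
func (a *App) ServePerRequest(path string) string {
	return a.wrap(a.dispatch)(path)
}

// Build compiles the chain once; Serve then reuses it for every request.
func (a *App) Build() { a.chain = a.wrap(a.dispatch) }

func (a *App) Serve(path string) string { return a.chain(path) }

// tag returns a middleware that brackets the downstream result with its name.
func tag(name string) MiddlewareFunc {
	return func(next HandlerFunc) HandlerFunc {
		return func(path string) string { return name + "(" + next(path) + ")" }
	}
}

func newApp() *App {
	return &App{
		middleware: []MiddlewareFunc{tag("a"), tag("b")},
		routes:     map[string]HandlerFunc{"/hi": func(string) string { return "hello" }},
	}
}

func main() {
	app := newApp()
	app.Build()
	fmt.Println(app.Serve("/hi"), app.Serve("/missing"))
}
```

The wraps counter is only there to show that Serve creates no new closures after Build, while ServePerRequest creates one per middleware on every call.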

Performance (before → after)

| Path | Before | After | Δ time | Allocs |
|---|---|---|---|---|
| 5-middleware request | 101 ns | 34 ns | −66% | 5 → 0 |
| Set per request | (1 map alloc) | 0 allocs | | 1 → 0 |
| QueryParam (single key) | 199 ns | 41 ns | −79% | 4 → 0 |
| String() response | 191 ns | 188 ns | flat | 4 → 3 |
| JSON() response | 347 ns | 350 ns | flat | 5 → 4 |
| Bind query (5 fields) | 961 ns | 688 ns | −28% | 8 |
| bindData w/ tags | 4973 ns | 2609 ns | −48% | |
| request-ID gen | 130 ns | 122 ns | −6% | 2 → 1 (−60% B/op) |
| Static / Param route | 27 / 42 ns | 27 / 43 ns | flat | 0 |

Headline: the middleware path and the Set/QueryParam paths are now allocation-free; binding is 28–48% faster.

Router — profiled, intentionally untouched

-cpuprofile shows the router is already 0 allocs/op, with time dominated by the irreducible LCP byte-loop (58%) and method switch (11%). I implemented the httprouter indices/IndexByte trick for findStaticChild and measured a 30–37% regression on hits — Echo's nodes have small fan-out, where the inlined linear scan beats a non-inlined IndexByte call — so it was reverted. No router change.
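
For readers unfamiliar with the two lookup strategies compared above, here is a rough sketch of both. The node type and both method names are hypothetical and do not mirror Echo's router layout; the sketch shows only the shape of the trade-off, not the measured numbers.

```go
package main

import (
	"bytes"
	"fmt"
)

// node is a hypothetical radix-tree node, not Echo's actual layout.
type node struct {
	label    byte
	children []*node
	indices  []byte // first byte of each child, in the same order as children
}

// findLinear scans the children directly; per the PR, this shape won
// for the small fan-out typical of Echo's nodes.
func (n *node) findLinear(b byte) *node {
	for _, c := range n.children {
		if c.label == b {
			return c
		}
	}
	return nil
}

// findIndexed is the httprouter-style variant: search a packed index, then jump.
func (n *node) findIndexed(b byte) *node {
	if i := bytes.IndexByte(n.indices, b); i >= 0 {
		return n.children[i]
	}
	return nil
}

// newNode builds a parent with one child per byte in labels.
func newNode(labels string) *node {
	n := &node{indices: []byte(labels)}
	for i := 0; i < len(labels); i++ {
		n.children = append(n.children, &node{label: labels[i]})
	}
	return n
}

func main() {
	n := newNode("abc")
	fmt.Println(n.findLinear('b') == n.findIndexed('b'))
}
```

Both functions return the same child; which one is faster depends on fan-out and inlining, so it needs to be benchmarked on the real tree, as the PR did.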

Using a faster JSON encoder (e.g. sonic)

This PR does not bundle sonic. The echo.JSONSerializer interface already lets any app swap encoders in ~10 lines:

import "github.com/bytedance/sonic"

// sonicJSON implements echo.JSONSerializer on top of sonic.
type sonicJSON struct{}

// Serialize marshals v and writes it to the response; indent is ignored.
func (sonicJSON) Serialize(c *echo.Context, v any, _ string) error {
	b, err := sonic.Marshal(v)
	if err != nil {
		return err
	}
	_, err = c.Response().Write(b)
	return err
}

// Deserialize decodes the request body into v.
func (sonicJSON) Deserialize(c *echo.Context, v any) error {
	return sonic.ConfigDefault.NewDecoder(c.Request().Body).Decode(v)
}

// e.JSONSerializer = sonicJSON{}

Measured on this machine (arm64): sonic decode takes 44% less time (a clear win on any arch), while encode takes 43% more time, i.e. it is slower (arm64 is sonic's weak arch; usually a win on amd64). A full cookbook example with these caveats will be a separate PR to labstack/echox.

Testing

  • go test ./... + -race pass; gofmt + go vet clean.
  • Added: store no-leak across Reset, JSON status across Reset, nested c.JSON, global/pre middleware on 404/405/OPTIONS, randomString concurrency, query fast-path stdlib-equivalence.

Comment thread sonic/README.md Outdated
@@ -0,0 +1,79 @@
# Echo sonic JSON serializer

Contributor

This could just be an example in the cookbook (https://echo.labstack.com/docs/category/cookbook). I think a PR to https://github.com/labstack/echox/tree/master/cookbook would make more sense than adding this submodule.

Member Author

Agreed — removed the sonic submodule from this PR. It'll go to echox/cookbook as a runnable example (with the decode-wins / arm64-encode caveat that's the genuinely useful part). This PR is now purely core hot-path perf. Thanks!

@aldas

aldas commented Jun 14, 2026

Contributor

@vishr, please check your email and respond to me.

- echo: compile global/pre middleware chains once instead of per request,
  eliminating per-request closure allocations (5 mw: 101ns/5allocs -> 34ns/0allocs)
- context: zero-copy String/HTML/JSONP writes, reuse delayedStatusWriter (guarded
  against re-entrant c.JSON) and the store map across requests, drop deferred unlock
  on Get/Set, single-key QueryParam fast path (199ns/4allocs -> 41ns/0allocs)
- bind: cache per-type struct field metadata (bindData -48%, query Bind -28%)
- add hot-path benchmark suite and pooling/dispatch regression tests

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vishr vishr force-pushed the perf/hot-paths-and-sonic-serializer branch from 96d6496 to 34b33be Compare June 14, 2026 01:01
@vishr vishr changed the title from "perf: optimize core hot paths + add opt-in sonic JSON serializer" to "perf: optimize core hot paths (chain, context, binding, responses)" Jun 14, 2026
vishr and others added 2 commits June 13, 2026 18:05
The 4 sub-tests reused a hard-coded port (1323) sequentially; the next bind
raced with the prior server's shutdown/socket release and failed on CI. Use
:0 and dial the address reported by ListenerAddrFunc, preserving the
network-family (tcp/tcp4/tcp6) intent without the fixed-port race.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
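
The de-flake technique in the commit above (bind to port 0, then dial the address the listener reports) can be sketched with the standard library alone. This is not the test from the PR: roundTrip and serveOnce are hypothetical helpers, and the sketch reads the address from net.Listener.Addr rather than from Echo's ListenerAddrFunc hook.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// serveOnce accepts a single connection, writes a greeting, and closes it.
func serveOnce(ln net.Listener) {
	conn, err := ln.Accept()
	if err != nil {
		return
	}
	defer conn.Close()
	fmt.Fprint(conn, "ok")
}

// roundTrip binds to port 0 so the kernel picks a free port, then dials
// whatever address the listener reports instead of a hard-coded one.
func roundTrip(network string) (string, error) {
	ln, err := net.Listen(network, "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()
	go serveOnce(ln)

	conn, err := net.Dial(network, ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	b, err := io.ReadAll(conn) // returns once the server closes its side
	return string(b), err
}

func main() {
	for _, network := range []string{"tcp", "tcp4"} {
		reply, err := roundTrip(network)
		fmt.Println(network, reply, err)
	}
}
```

Because every call gets its own port, consecutive sub-tests no longer depend on the previous socket having been released.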
…t, test cached bind errors

- context: correct newContext comment (Reset clears the store map, no longer nils it);
  document that only json()'s nested-guard may point the response at &c.dsw
- test: deterministic cold-then-warm bind ensures the per-type cache preserves
  field-name conversion errors regardless of suite ordering

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@aldas

aldas commented Jun 14, 2026

Contributor

@vishr , please check your emails: v@labstack.com and vr@labstack.com

@vishr

vishr commented Jun 14, 2026

Member Author

> @vishr , please check your emails: v@labstack.com and vr@labstack.com

I did. Are you referring to these comments or something else?

@aldas

aldas commented Jun 14, 2026

Contributor

I mean the emails sent on May 23, 26, and 27, 2026 to v and vr.

@vishr

vishr commented Jun 14, 2026

Member Author

Yes, let me reply.
